Generating Automatic Keywords for Conversational Speech ASR Transcripts
Authors
Abstract
While a wealth of conversational speech has been recorded and archived for over a century, it has not been easily accessible: compared with text and rehearsed speech, conversational speech poses many technical challenges that must be addressed before such archives can be effectively searched and used. In this paper, we describe two language modeling methods for automatically assigning keywords to automatic speech recognition (ASR) transcripts, to support search and browsing of conversational speech archives. Experiments are performed on the English CLEF CL-SR MALACH collection of oral history interviews. Compared with a prior baseline that generates 20 keywords per conversation segment, we use 1/20th of the training data yet improve Recall@20 in matching manual keywords. However, while indexing manual keywords improves search accuracy, indexing automatic keywords (ours or the baseline's) fails to improve search accuracy, underscoring the need for further research.
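The abstract reports Recall@20 against manual keywords but does not spell out the metric. A common definition, assumed here, is the fraction of a segment's manual keywords that appear among the top 20 automatically assigned keywords, averaged over segments. The sketch below computes that; the function names `recall_at_k` and `mean_recall_at_k` and the toy keywords are hypothetical illustrations, not the authors' evaluation code, and the paper's exact protocol may differ.

```python
from typing import Sequence, Set, Tuple


def recall_at_k(ranked_keywords: Sequence[str], manual_keywords: Set[str], k: int = 20) -> float:
    """Fraction of manual keywords found among the top-k ranked automatic keywords.

    Assumed definition of Recall@k; not taken from the paper.
    """
    if not manual_keywords:
        return 0.0  # assumption: a segment with no manual keywords scores 0
    top_k = set(ranked_keywords[:k])  # only the first k automatic keywords count
    return len(top_k & manual_keywords) / len(manual_keywords)


def mean_recall_at_k(runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int = 20) -> float:
    """Average Recall@k over (ranked automatic keywords, manual keywords) pairs, one per segment."""
    scores = [recall_at_k(ranked, manual, k) for ranked, manual in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical segment: 2 of its 3 manual keywords appear in the automatic list.
automatic = ["camps", "deportation", "family", "liberation"]
manual = {"deportation", "liberation", "ghettos"}
print(recall_at_k(automatic, manual, k=20))  # 0.6666666666666666
```

Because recall is normalized by the number of manual keywords rather than by k, a system is not penalized for filling all 20 slots; only how many manual keywords it recovers within the cutoff matters.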
Similar resources
Acoustic Model Training with Detecting Transcription Errors in the Training Data
As the target of Automatic Speech Recognition (ASR) has moved from clean read speech to spontaneous conversational speech, we need to prepare orthographic transcripts of spontaneous conversational speech to train acoustic models (AMs). However, it is expensive and slow to manually transcribe such speech word by word. We propose a framework to train an AM based on easy-to-make rough transcripts ...
A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts
The Fraunhofer IAIS AudioMining system for vocabulary independent spoken term detection is able to provide automatic speech recognition (ASR) transcripts for audio-visual data. These transcripts can be used to search for information, e.g., in audio-visual archives. We experienced difficulties in the process of browsing for desired content when only these transcripts are given, especially since ...
Automatic Recognition of Emotionally Coloured Speech
Emotion in speech is an issue that has been attracting the interest of the speech community for many years, both in the context of speech synthesis as well as in automatic speech recognition (ASR). In spite of the remarkable recent progress in Large Vocabulary Recognition (LVR), it is still far behind the ultimate goal of recognising free conversational speech uttered by any speaker in any envi...
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multimicrophone conversational ASR in real home environments. Speech material was elicited using a dinn...
An empirical analysis of word error rate and keyword error rate
This paper studies the relationship between word error rate (WER) and keyword error rate (KER) in speech transcripts and their effect on the performance of speech analytics applications. Automatic speech recognition (ASR) systems are increasingly used as input for speech analytics, which raises the question of whether WER or KER is the more suitable performance metric for calibrating the ASR sy...